BL-WoLF: A Framework For Loss-Bounded Learnability In Zero-Sum Games
Authors
Abstract
We present BL-WoLF, a framework for learnability in repeated zero-sum games where the cost of learning is measured by the losses the learning agent accrues (rather than the number of rounds). The game is adversarially chosen from some family that the learner knows. The opponent knows the game and the learner’s learning strategy. The learner tries to either not accrue losses, or to quickly learn about the game so as to avoid future losses (this is consistent with the Win or Learn Fast (WoLF) principle; BL stands for “bounded loss”). Our framework allows for both probabilistic and approximate learning. The resultant notion of BL-WoLF-learnability can be applied to any class of games, and allows us to measure the inherent disadvantage to a player that does not know which game in the class it is in. We present guaranteed BL-WoLF-learnability results for families of games with deterministic payoffs and families of games with stochastic payoffs. We demonstrate that these families are guaranteed approximately BL-WoLF-learnable with lower cost. We then demonstrate families of games (both stochastic and deterministic) that are not guaranteed BL-WoLF-learnable. We show that those families are nevertheless BL-WoLF-learnable. To prove these results, we use a key lemma which we derive.
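As a rough sketch of the quantity involved (notation ours, not necessarily the paper's): if v denotes the value of the underlying zero-sum game to the learner and u_t its payoff in round t, the cost of learning is the cumulative shortfall relative to the value, and a family of games is guaranteed learnable with some loss bound when a single learning strategy keeps that shortfall uniformly bounded over games in the family and opponent strategies.

```latex
% Sketch only; v, u_t, L_T, \ell and \mathcal{G} are our notation, not the paper's symbols.
L_T \;=\; \sum_{t=1}^{T} \bigl( v - u_t \bigr),
\qquad
\sup_{G \in \mathcal{G}} \;\sup_{\text{opponent}} \;\sup_{T \ge 1} \;
\mathbb{E}\!\left[ L_T \right] \;\le\; \ell .
```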
Similar Papers
A Transition from Two-Person Zero-Sum Games to Cooperative Games with Fuzzy Payoffs
In this paper, we deal with games with fuzzy payoffs. We prove that players who are playing a zero-sum game with fuzzy payoffs against Nature are able to increase their joint payoff, and hence their individual payoffs, by cooperating. It is shown that a cooperative game with a fuzzy characteristic function can be constructed via the optimal game values of the zero-sum games with fuzzy payoff...
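For orientation only, the classical von Neumann–Morgenstern template that such constructions typically follow (stated here without the fuzzy machinery, in our notation; the paper's construction may differ in detail): the characteristic value of a coalition S is the optimal value of the zero-sum game in which S plays jointly against the rest.

```latex
% Classical template, our notation; \tilde u_S is the coalition's (fuzzy) total payoff.
\tilde v(S) \;=\; \max_{x_S \in \Delta(A_S)} \; \min_{y \in \Delta(A_{N \setminus S})} \; \tilde u_S(x_S, y),
\qquad S \subseteq N .
```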
No-Regret Learnability for Piecewise Linear Losses
In the convex optimization approach to online regret minimization, many methods have been developed to guarantee an O(√T) bound on regret for subdifferentiable convex loss functions with bounded subgradients, by using a reduction to linear loss functions. This suggests that linear loss functions tend to be the hardest ones to learn against, regardless of the underlying d...
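A minimal sketch of the reduction referred to above, under assumptions of ours (a unit-ball decision set, hinge-like piecewise-linear losses, and the standard projected online gradient descent step on the linearized loss); this is the textbook O(√T) construction, not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 5, 10_000
D, G = 2.0, 1.0                                  # diameter of the unit ball, subgradient bound

def project(x):                                  # Euclidean projection onto the unit ball
    n = np.linalg.norm(x)
    return x if n <= 1.0 else x / n

x = np.zeros(d)
alg_loss, cmp_loss = 0.0, 0.0                    # learner vs. fixed comparator x = 0

for t in range(1, T + 1):
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)                       # direction revealed this round
    f = lambda z, w=w: max(w @ z, 0.0)           # piecewise-linear convex loss
    alg_loss += f(x)
    cmp_loss += f(np.zeros(d))
    g = w if w @ x > 0.0 else np.zeros(d)        # subgradient at x, i.e. the linearized loss
    x = project(x - (D / (G * np.sqrt(t))) * g)  # OGD step with the standard 1/sqrt(t) rate

print(f"regret vs. x=0: {alg_loss - cmp_loss:.1f}, O(D*G*sqrt(T)) scale: {D*G*np.sqrt(T):.0f}")
```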
On Repeated Zero-Sum Games with Incomplete Information and Asymptotically Bounded Values
We consider repeated zero-sum games with incomplete information on the side of Player 2, with the total payoff given by the non-normalized sum of stage gains. In the classical examples the value VN of such an N-stage game is of the order of N or √N as N → ∞. Our aim is to present a general framework for another asymptotic behavior of the value VN observed for the discrete version of the financial ...
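For orientation (a recollection of the classical Aumann–Maschler and Zamir results in our notation, not a claim about this paper): with incomplete information on one side and prior p,

```latex
\lim_{N \to \infty} \frac{V_N(p)}{N} \;=\; \operatorname{cav} u(p),
\qquad
V_N(p) \;=\; N \operatorname{cav} u(p) \;+\; O\!\bigl(\sqrt{N}\bigr),
```

where u(p) is the value of the non-revealing game; hence V_N grows linearly when cav u(p) is nonzero and can grow like √N when it vanishes, which is the dichotomy the abstract refers to.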
Simple Characterizations of Potential Games and Zero-sum Games
We provide several tests to determine whether a game is a potential game or whether it is a zero-sum equivalent game—a game which is strategically equivalent to a zero-sum game in the same way that a potential game is strategically equivalent to a common interest game. We present a unified framework applicable to both potential and zero-sum equivalent games by deriving a simple but useful char...
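To fix ideas, one standard way of making "strategically equivalent to a zero-sum game" precise (our phrasing, using payoff shifts f_i that depend only on the opponents' action profile a_{-i}; the paper may allow a more general equivalence, e.g. with positive rescaling):

```latex
u_i(a) \;=\; \hat u_i(a) + f_i(a_{-i}) \;\;\text{for all } i, a,
\quad \text{with} \quad \sum_i \hat u_i \equiv 0
\;\;\Longleftrightarrow\;\;
\sum_i u_i(a) \;=\; \sum_i f_i(a_{-i}) \;\;\text{for all } a .
```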
Robust commitments and partial reputation
How should agents shape a finite “reputational” history to their advantage, knowing that others will be learning from that history? We focus on one leader interacting with multiple followers and show that in most non-zero-sum games, the traditional Stackelberg mixed commitment is very fragile to observational uncertainty. We propose robust commitment rules that anticipate being learned and show ...
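A small illustration of the kind of fragility described above, in a hypothetical 2x2 leader/follower game of our own choosing (not an example from the paper): the optimal mixed commitment sits exactly at the follower's indifference point, so an arbitrarily small error in how the commitment is perceived flips the follower's response and collapses the leader's payoff.

```python
import numpy as np

# Hypothetical 2x2 game (ours, for illustration): rows = leader actions, columns = follower actions.
A = np.array([[1.0, 3.0],          # leader payoffs
              [0.0, 2.0]])
B = np.array([[1.0, 0.0],          # follower payoffs
              [0.0, 1.0]])

def leader_payoff(p_true, p_perceived):
    """Leader commits to (p, 1-p) over rows; the follower best-responds to the
    *perceived* commitment, breaking exact ties in the leader's favor."""
    x_true = np.array([p_true, 1.0 - p_true])
    x_seen = np.array([p_perceived, 1.0 - p_perceived])
    follower_vals = x_seen @ B                     # follower's expected payoff per column
    leader_vals = x_true @ A                       # leader's expected payoff per column
    ties = np.flatnonzero(np.isclose(follower_vals, follower_vals.max()))
    col = max(ties, key=lambda j: leader_vals[j])  # favorable tie-breaking
    return leader_vals[col]

p_star = 0.5                                       # optimal commitment in this game
print(leader_payoff(p_star, p_star))               # 2.5  (commitment observed exactly)
print(leader_payoff(p_star, p_star + 1e-3))        # 0.5  (slightly misperceived commitment)
```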